The Patient Protection and Affordable Care Act

As of December 2013, it is abundantly clear that the Affordable Care Act (ACA) is a remarkably contentious issue from a political standpoint. As with any political controversy, the debate surrounding this effort at health care reform is heavily distorted by misleading information, which leads many to debate fictitious, irrelevant, or less than significant aspects of the law. To quote the venerable President Fowler from Sum of All Fears (yes, my family and I just watched this movie over the Christmas break), there is "too much bullshit, and not enough facts."

What follows seeks to provide a modest primer on the ACA (a.k.a. Obamacare), fortified by data. Inevitably, it will be colored by the materials to which I have been exposed, but I am open to addendum or amendment should the need arise. In particular, the following components will be addressed:

My motivation for writing this script admittedly extends beyond just collecting these thoughts in one place. Given the need to integrate spatially-referenced categorical and interval data, this Notebook provides an opportunity to work with some very exciting libraries in Python.

This Notebook is intended to serve those who care only about the content and those who want to see the nuts an bolts of working with the data. Consequently, commentary about coding choices and approaches will be entirely contained within the code cells. Doing so permits the creation of a document that hides all coding elements.

In [38]:
'''The first step is loading in the toolset environment.  We will also set some globale parameters'''

#Load relevant libraries
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import geopandas as gp
import vincent as vt
from vincent import *
import pandas.io.data as web
from IPython.display import HTML
import pysal as ps
import seaborn

#Set print width
pd.set_option('line_width',100)

Information Sources & Data

This primer draws upon four primary sources that the author feels capture a great deal of useful information about the subject. This set of sources is not intended to be exhaustive (remember, this is a limited exploration of the subject), nor will it capture all aspects of this debate. This set is, however, achievably digested in a relatively short period of time. Consequently, it is easier for one to check on the elements discussed in this Notebook. Furthermore, the last two papers are reflect very recent thinking on the subject, over three and a half years after the passage of the ACA. The sources are as follows:

  1. Assuring Affordable Health Care for All Americans (Butler, 1989)
  2. The Impacts of the Affordable Care Act: How Reasonable are the Projections? (Gruber, 2011)
  3. How Obamacare Will Re-Shape the Practice of Medicine (Gottlieb, 2013)
  4. The Affordable Care Act: A User's Guide to Implementation (Burke & Kamarck, 2013)

The data displayed in figures below will be taken largely from three sources:

  1. The tables compiled in the appendix of (Burke & Kamarck)
  2. Medicare Provider Charge data released by Centers for Medicare & Medicaid Services
  3. The FRED database maintained by the St. Louis FED

The Role of Insurance

To understand the ACA, one must have a basic idea of the insurance concept. Insurance is a means of transferring risk from beneficiaries (e.g. individuals like you or I) to insurance providers (e.g. Blue Cross Blue Shield). In a very basic sense, a pure, profitless insurance transaction relies upon the concept of expected value to match the value of a stream of "small" recurrent payments to the probabalistic cost of a "big" payment.

\[\sum^t s_t = E(B) = \int_{-\infty}^{\infty}b f(b) db = \sum^i p_i b_i\]

where \(s_t\) = small payment at time \(t\) and \(b_i\) = big payment associated with event \(i\).

Let's unpack this a bit to see what's happening. The small payment term, \(\sum^t s_t\), represents the periodic payments that an individual would make to an insurance provider. They are called insurance premia. The payment periods are plan-specific and may be monthly, quarterly, etc.

The expected loss term, \(E(B)\), is the expectation of costs due to big events (e.g. heart attack) that occur during the insurance term. The last two terms, \(\int_{-\infty}^{\infty}b f(b) db\) & \(\sum^i p_i b_i\) are the continuous and discrete methods of calculating the expected loss term. Both effectively say that the expected loss associated with a big event (\(E(B)\)) is equal to the cost of that event (\(b\)) multiplied by the probability of it's occurrence (\(p\)). The summation just makes sure we include all relevant big events.

So, for example, if a heart attack cost $10,000, and the probability of an individual having one is 10% over the insurance term, the expected value is $1,000. This expected value is the liability incurred by an insurance provider if they take on this hypothetical individual. To be made whole, the provider must receive a stream of insurance premium payments from the individual that sum to $1,000 over the insurance term.

Everyone has some exposure to the insurance concept, so why bother to break it down? The reason is to highlight what it is not. Insurance is purely a risk transferring operation that attempts to mitigate the cash flow difficulties associated with catastrophic events. Individuals can accommodate small premium payments over time, but if large events come along that exceed the individual's income in a given time period (and available savings), individuals must either incur debt or go without. (Debt can be an efficient way to finance investments, but it becomes problematic when the volume of debt dwarfs the income needed to service it.)

Insurance is related to, but not the same thing as access to care. Access to care means that an individual can actually acquire and consume the health services that a required at any given time. By contracting with an insurance provider, an individual increases the set of available health services that can be financed at any given time. Whether or not the health services are available is a separate concern, and this distinction is critical to the ultimate performance of the ACA.

Health Care in the US

Perhaps the most well-known fact about health care in the US is that we spend a lot on it, and that amount grows each year. Consider the growth in health care related personal expenditures. Note that this captures only a portion of total health activity in the country.

In [2]:
'''Displaying vincent graphics within the Notebook requires that we first initialize this component of the package'''
vt.core.initialize_notebook()
<IPython.core.display.Javascript at 0x2781250>
In [3]:
'''To begin, we will pull just the personal consumption expenditures related to health, and plot them with vincent.  
Note that after we create the plot object, we will need to switch the data type of the x-axis to ordinal with the scales
method.  This is because we are dealing with temporal information, which vincent does not yet appear to accommodate
seamlessly when working with pandas DataFrames (DFs)'''

#Capture health expenditures
hth_exp=web.get_data_fred('DHLCRC1Q027SBEA','1/1/1970','12/1/2013')

#Create area object
hth_exp_area=vt.Area(hth_exp)

#Create axis labels
hth_exp_area.axis_titles(x='Quarter',y='Health Expenditures ($B)')

#Set number of tick marks
hth_exp_area.axes[0].ticks=10

hth_exp_area.display()
<IPython.core.display.Javascript at 0x3c38c50>

Clearly these expenditures are increasing over time. The relevant question is, however, does this matter? In general, looking at expenditures for health (or any social expenditure item) in a vacuum is of limited value. Our expenditure in health is over seven times as large now as it was 30 years ago. I guess that's passingly interesting, and it certainly mirrors the kind of information often tossed about in the news. But again, so what?

That information does not provide any operational knowledge. We spend more on a lot of things than we did 30 years ago. We cannot make policy base upon that information alone. The relevant question (or one of them anyway) centers on opportunity cost. If we are spending \(x\) on health care, how much do we give up in other areas? This is one of the few cases in which making a comparison to a household budget actually makes sense. If one spent 25% of the household budget on going out to eat last year, and then spent 30% of the budget on the same activity this year, everything else gets squeezed. Just as our household budget is set by the income we pull in, the relevant budget constraint for the country is GDP. Let us reconsider health expenditures as a portion of GDP.

In [4]:
'''We will perform a similar operation here as we did with the health expenditure data above.  We are only
inserting the newly calculated fraction as our series.'''

#Capture GDP
gdp=web.get_data_fred('GDP','1/1/1970','12/1/2013')

#Combine health expenditure and GDP information
hth_gdp=DataFrame(hth_exp).join(gdp)

#Rename columns
hth_gdp.columns=['hth','gdp']

#Calculate health expenditure as % of GDP
hth_gdp['hth_gdp']=hth_gdp['hth']/hth_gdp['gdp']

#Create area object
hth_gdp_area=vt.Area(hth_gdp['hth_gdp'])

#Create axis labels
hth_gdp_area.axis_titles(x='Quarter',y='Health Expenditures ($B)')

#Set number of tick marks
hth_gdp_area.axes[0].ticks=10

#Set color of chart
hth_gdp_area.colors(brew='RdBu')

hth_gdp_area.display()
<IPython.core.display.Javascript at 0x3c4ad90>

This chart is a bit more alarming. By measuring expenditures relative to income, we see that health expenditures are increasingly crowding out other investments. That being said, the relative flattening of the curve in the last five years has led many to speculate about whether or not the dire fiscal implications of the historical trend are still as likely to come to pass.

Placing US health expenditures in international context is more jarring still. The table below displays total health expenditure (not just personal consumption like above) as a percentage of GDP across OECD countries.

In [5]:
HTML('<iframe'+\
     ' src=http://www.keepeek.com/Digital-Asset-Management/embed-oecd/social-issues-migration-health/'+\
     'total-expenditure-on-health-2013-2_hlthxp-total-table-2013-2-en'+\
     ' width=1000 height=1200/>')
Out[5]:

Material Concerns About the ACA

Most of the media attention has focused on the problem of adverse selection and the budgetary impact of the ACA. We can address both briefly before moving on to conceptual critique offered by the American Enterprise Institute (a prominent conservative think tank that, in my opinion, far outstrips the Heritage Foundation as of late in terms of analytical rigor).

With respect to adverse selection, the concern is that not enough healthy individuals will sign up, instead opting to pay the penalty. Said differently, the penalty is not sufficient to get them to participate in the exchanges. When viewed from this perspective, it becomes clear that this is a calibration issue. It's engineering, not breakthrough research. If the penalty is not optimal, we can adjust it. This has little to do with the abstract concept of a penalty.

The budgetary impact is sometimes difficult to lockdown because wildly different numbers abound. The reason for this is that commentators and Congressional Members are comparing apples and oranges. To provide hard and fast concept names, Republicans tend to talk about the cost of Obamacare, while Democrats talk about the fiscal impact. Make no mistake, both parties confusingly use the words interchangeably. In a sense, they are interchangeable, just not as they are being used here. As we are using them here, cost refers to only the spending side impacts of the ACA, while fiscal impact takes into account revenues as well. These concepts are related in the following straightforward equation:

fiscal impact \(=\) revenues \(-\) cost

When CBO originally scored the ACA, they indicated a net positive fiscal increase (the ACA reduces the deficit over the budget window). While the estimates have fluctuated somewhat, these fluctuations have been minimal. Using the best available information, as a consequence of the revenue provisions embedded in the ACA, CBO still forecasts a net positive fiscal impact of

Conceptual Critique

Dr. Gottlieb of AEI sees the ACA as "a plan at war with itself." (Gottlieb, 2013) For example, as outlined in the ACA5 above, it depends on insurers to make investments while simultaneously limiting the non-service related operating margins available to said insurance providers. These inconsistencies may limit the effectiveness of the law. Gottlieb's real problem, however, lies in the push towards what are known as Accountable Care Organizations. In effect, these ACOs are integrated service delivery environments that seek to reduce the number of providers by absorbing smaller practices.

How does one get smaller practices to join large organizations? The ACA contains provisions that seek to limit reimbursement rates to providers by basically saying, you get a set amount of money to fix this person. If the costs runover, it's on you. This is in contrast to the existing model in which the insurance company is on the hook for all services provided (to the extent they cannot argue their way out of them). The resultant exposure can be too much to absorb for smaller practices. Thus, by shifting risk from insurers to providers, a strong incentive is created to join these ACOs, which have a stronger resource base.

From the perspective of the Obama Administration, this provides an opportunity for cost control by permitting organizational managers to act as budgetary stewards. Gottlieb argues that this is a recipe for fragmented care (due to a transition from family practitioners to "shift" doctors) with a strong incentive to underprovide services. Furthermore, he argues that a similar consolidation effort failed 20 years ago because the cost reductions never materialized. The same is likely to happen now because the downward pressure on cost appears to be uncoupled with clinical outcomes (among other things).

To his credit, Gottlieb acknowledges that the Obama Administration has considered the same issues he fears. They, however, believe that the earlier failures were a result of infrastructure failures that have since been upgraded.

What is the context in which the ACA operates?

Gottlieb (2013) argues that in addition to the theories outlined above, the ACA was and is motivated by a strong belief that the data suggest efficiencies are there for the taking. We started above with a very high-level view, so now it would be appropriate to explore the current state of insurance coverage and implementation of the ACA.

It is useful to start with a view of the uninsured population proportion by state. (Note that the distortions are recognized, particularly Michigan. However, for quick exploratory plots, these get the message across.)

In [7]:
'''This is the first instance of a geopandas plot in this script.  pandas DFs convert quite naturally to geopandas
GeoDataFrames (GDFs).  Note the following coding conventions:

party 
{0:'Republican',
 1:'Democrat'}
 
reelect
{0:'No',
 1:'Yes'}

exchange
{0:'State',
 1:'Partnership',
 2:'Federal'}
 
medicaid
{0:'No',
 1:'Leaning Toward Not Expanding',
 2:'Leaning Toward Expanding',
 3:'Yes'}

'''

#Set working directory
workdir='/home/choct155/dissertation/MiscProj/'

#Read in data
aca=pd.read_csv(workdir+'aca_state.csv').set_index('state')

#Read in shape file
states=gp.GeoDataFrame.from_file(workdir+'tl_2013_us_state.shp').set_index('STUSPS')

#Identify states (common elements of aca and states, which exclude territories and DC)
st_set=set(states.index) & set(aca.index)

#Drop Alaska and Hawaii
st_set_sub=[x for x in list(sorted(st_set)) if x not in ['AK','HI']]

#Join on states
st_aca=gp.GeoDataFrame(states.join(aca).ix[st_set_sub])
# HTML(st_aca[st_aca.columns[4:]].head().to_html())
In [8]:
'''We will first plot the uninsured population as a percentage of each state's total population.
Darker colors indicate higher proportions of uninsured'''

#Set plot size
plt.rcParams['figure.figsize']=18,12

#Create plotting object
fig,axes=plt.subplots(1)

#Plot uninsured population by state
st_aca.plot(column='unins',colormap='Reds',axes=axes)

#Get rid of chart junk (axes)
axes.set_axis_off()

#Set title
axes.set_title('Proportion of Uninsured by State (2013)',fontsize=22);

The map reveals some strong regional disparities in insurance coverage. Here are the ten states with the lowest proportions of uninsured residents...

In [9]:
HTML(DataFrame(st_aca['unins'].order()).head(10).to_html())
Out[9]:
unins
MA 0.04
MN 0.09
VT 0.09
CT 0.10
ME 0.10
WI 0.10
DE 0.11
IA 0.11
ND 0.11
NH 0.11

...and the ten states with the highest proportions.

In [10]:
HTML(DataFrame(st_aca['unins'].order()).tail(10).to_html())
Out[10]:
unins
WY 0.18
MS 0.19
CA 0.20
FL 0.20
GA 0.20
LA 0.20
SC 0.20
NM 0.21
NV 0.22
TX 0.24

There are a couple interesting things to note here. First, Massachussetts, the recognized test bed for the ACA, has the lowest proportion by far. The second is the distribution of uninsured individuals by party. Here is a map of the governor's party by state in 2013.

In [11]:
#Create plotting object
fig,axes=plt.subplots(1)

#Plot uninsured population by state
st_aca.plot(column='party',colormap='RdBu',axes=axes)

#Get rid of chart junk (axes)
axes.set_axis_off()

#Set title
axes.set_title('Party of Each Governor (2013)',fontsize=22);

Do we see differences in the uninsured across party lines?

In [12]:
#Set plot size
plt.rcParams['figure.figsize']=15,6

#Capture uninsured data by party
REP=st_aca['unins'][st_aca['party']==0].values
DEM=st_aca['unins'][st_aca['party']==1].values

#Generate plot object
fig,axes=plt.subplots(1)

#Plot notched boxplot
axes.boxplot([REP,DEM],vert=False,notch=True,bootstrap=1000,widths=[.02*len(REP),.02*len(DEM)])

#Overlay scatter plot of original data
axes.scatter(st_aca['unins'],st_aca['party'] + np.random.normal(1,.015,len(st_aca['party'])),alpha=.5,s=50)

#Calculate means by party
mean_unins=[REP.mean(),DEM.mean()]
axes.scatter(mean_unins,[1,2],c='r',s=150,alpha=.5,marker='D')

#Set labels
axes.set_yticklabels(['R','D'])
plt.xlabel('Uninsured Proportion of the Population')
plt.ylabel('Party of the Governor')
plt.title('Uninsured Populations by Party of the Governor');
In [13]:
'''We are just calculating a simple t-test of the means here'''
from scipy import stats

#Perform t-test
t=stats.ttest_ind(REP,DEM,equal_var=False)

#Share results
print 'THE p-VALUE OF A t-TEST OF THE MEANS IS ',t[1]
THE p-VALUE OF A t-TEST OF THE MEANS IS  0.0229664734025

To unpack the diagram a bit, we see boxplots of the uninsured data, split by the party of the Governor. The vertical red line indicates the median, while the red diamond indicates the mean for each group. The blue circles are the individual data points from the original information.

Interestingly enough, while the average uninsured population appears to be greater (with a high degree of confidence) in Republican states, the typical (a.k.a. median) cases are quite similar. One might reasonably ask why we are looking at this in the first place. The reason is to establish whether disparities in the target population impact implementation decisions.

Observe the distribution of implementation efforts by state. For the arrangement of state exchanges, there are three groups (sorry, the colors are a little backwards on this one):

For the decision to expand Medicaid, there are four groups:

To put this in context, Burke & Kamarck (2013) suggests that states that do not implement their own exchange and have opted against expanding Medicaid may be seen as obstructing the implementation of the law.

In [14]:
#Set plot size
plt.rcParams['figure.figsize']=18,20

#Create plotting object
fig,axes=plt.subplots(2)

#Plot uninsured population by state
st_aca.plot(column='exchange',colormap='RdBu',axes=axes[0])
st_aca.plot(column='medicaid',colormap='RdBu',axes=axes[1])

#Get rid of chart junk (axes)
axes[0].set_axis_off()
axes[1].set_axis_off()

#Set title
axes[0].set_title('Arrangement of State Exchange',fontsize=22)
axes[1].set_title('Decision to Expand Medicaid',fontsize=22);

What we notice here is a strong partisan divide. This divide is far from surprising, but boxplots can again help us see how wide the gap is with respect to these provisions.

In [15]:
#Set plot size
plt.rcParams['figure.figsize']=15,12

#Capture data by party
REP_ex=st_aca['exchange'][(st_aca['exchange'].notnull()) & (st_aca['party']==0)].values
DEM_ex=st_aca['exchange'][(st_aca['exchange'].notnull()) & (st_aca['party']==1)].values
REP_med=st_aca['medicaid'][(st_aca['medicaid'].notnull()) & (st_aca['party']==0)].values
DEM_med=st_aca['medicaid'][(st_aca['medicaid'].notnull()) & (st_aca['party']==1)].values

#Generate plot object
fig,axes=plt.subplots(2)

#Plot notched boxplot
axes[0].boxplot([REP_ex,DEM_ex],vert=False,notch=True,bootstrap=1000,widths=[.02*len(REP),.02*len(DEM)])
axes[1].boxplot([REP_med,DEM_med],vert=False,notch=True,bootstrap=1000,widths=[.02*len(REP),.02*len(DEM)])

#Overlay scatter plot of original data
axes[0].scatter(st_aca['exchange'],st_aca['party'] + np.random.normal(1,.025,len(st_aca['party'])),alpha=.5,s=50)
axes[1].scatter(st_aca['medicaid'],st_aca['party'] + np.random.normal(1,.025,len(st_aca['party'])),alpha=.5,s=50)

#Calculate means by party
mean_ex=[REP_ex.mean(),DEM_ex.mean()]
axes[0].scatter(mean_ex,[1,2],c='r',s=150,alpha=.5,marker='D')
mean_med=[REP_med.mean(),DEM_med.mean()]
axes[1].scatter(mean_med,[1,2],c='r',s=150,alpha=.5,marker='D')

#Set labels
axes[0].set_yticklabels(['R','D'])
axes[1].set_yticklabels(['R','D'])
# axes[0].xlabel('Uninsured Proportion of the Population')
axes[0].set_ylabel('Party of the Governor')
axes[1].set_ylabel('Party of the Governor')
axes[0].set_title('Arrangement of State Exchange')
axes[1].set_title('Decision to Expand Medicaid');

Unlike the comparison across parties for the uninsured population, we see stark differences for both the mean and median cases in implementation. Nearly all Republican states have elected not to set up their own exchange while most Democratic states have done so. Further, nearly all Democratic states have elected to expand Medicaid whil most Republican states either chose not to do so, or are leaning in that direction.

This is not the kind of split one expects to see if each decision were based simply on the empirical needs of each respective state. Again, this is not surprising, but it does throw into relief the nature of the debate, and the remarkable disconnect between the merits of a given policy and the likelihood that it will be pursued.

Can the ACA make a difference?

Contemporary efforts at health reform universally feature fiscal implications as a key (if not the primary) motivation. As pointed out above, we are justifiably concerned that health expenditures will eat into our ability to advance other policy goals. The ACA, somewhat paradoxically, actually attempts to increase the amount of health services that can be consumed for millions of people. It is not unreasonable to suspect that the reduction in expenditures that may materialize due to taxes on "cadillac" health plans will not fully offset the expenditure increase that is required to get currently uninsured individuals the statutorily defined minimum level of services required of ACA-type insurance plans. One might ask, if we want to lower health expenditures, why are we trying to by more health services?

This is a complex question with many moving parts. We will not go into detail here, but it suffices to say that the ACA assumes that we can lower the cost of health care provision. We can "bend the cost curve" as it were. The implicit assumption here is that inefficiencies exist that can be addressed with clever policy. There are at least three types of inefficiencies that the ACA targets.

  1. Emergency facility care is very expensive. The designers of the ACA believe that the demand for emergency care can be reduced by providing regular health access and preventative care. The idea here is that if given the choice, people will (on average) address health concerns before they require a trip to the emergency room.

  2. The distributed "fee for service" model is overly costly because doctors are incentivized to provide expensive, but unnecessary tests. Furthermore, return visits for patients are costly, and the current reimbursement scheme does not incentivize "getting it right the first time through."

  3. Lack of transparency and competition in the market create disparities that allow market prices to vary in a manner largely uncoupled from the actual production costs for health services.

I do not currently have data that would facilitate exploration of the first two considerations, but we can take a peek at Medicare Provider Charge Data to gain partial insight into the last one.

In [16]:
'''Here we can read in and briefly examine the data provided by CMS'''

#Establish data location
data_dir='/home/choct155/dissertation/MiscData/'

#Read in data
inpat=pd.read_csv(data_dir+'Medicare_Provider_Charge_Inpatient_DRG100_FY2011.csv')
outpat=pd.read_csv(data_dir+'Medicare_Provider_Charge_Outpatient_APC30_CY2011_v2.csv')

print inpat.head()
print outpat.head()
                                      DRG Definition  Provider Id  \
0           039 - EXTRACRANIAL PROCEDURES W/O CC/MCC        10001   
1  057 - DEGENERATIVE NERVOUS SYSTEM DISORDERS W/...        10001   
2  064 - INTRACRANIAL HEMORRHAGE OR CEREBRAL INFA...        10001   
3  065 - INTRACRANIAL HEMORRHAGE OR CEREBRAL INFA...        10001   
4  066 - INTRACRANIAL HEMORRHAGE OR CEREBRAL INFA...        10001   

                      Provider Name Provider Street Address Provider City Provider State  \
0  SOUTHEAST ALABAMA MEDICAL CENTER  1108 ROSS CLARK CIRCLE        DOTHAN             AL   
1  SOUTHEAST ALABAMA MEDICAL CENTER  1108 ROSS CLARK CIRCLE        DOTHAN             AL   
2  SOUTHEAST ALABAMA MEDICAL CENTER  1108 ROSS CLARK CIRCLE        DOTHAN             AL   
3  SOUTHEAST ALABAMA MEDICAL CENTER  1108 ROSS CLARK CIRCLE        DOTHAN             AL   
4  SOUTHEAST ALABAMA MEDICAL CENTER  1108 ROSS CLARK CIRCLE        DOTHAN             AL   

   Provider Zip Code Hospital Referral Region Description   Total Discharges   \
0              36301                          AL - Dothan                  91   
1              36301                          AL - Dothan                  38   
2              36301                          AL - Dothan                  84   
3              36301                          AL - Dothan                 169   
4              36301                          AL - Dothan                  33   

    Average Covered Charges    Average Total Payments   
0                32963.07692               5777.241758  
1                20312.78947               4894.763158  
2                38820.39286              10260.214290  
3                27345.10059               6542.088757  
4                17605.51515               4596.393939  
                                        APC  Provider Id  \
0  0012 - Level I Debridement & Destruction        10029   
1  0012 - Level I Debridement & Destruction        20024   
2  0012 - Level I Debridement & Destruction        30064   
3  0012 - Level I Debridement & Destruction        30088   
4  0012 - Level I Debridement & Destruction        30111   

                                       Provider Name     Provider Street Address Provider City  \
0                EAST ALABAMA MEDICAL CENTER AND SNF      2000 PEPPERELL PARKWAY       OPELIKA   
1                 CENTRAL PENINSULA GENERAL HOSPITAL          250 HOSPITAL PLACE      SOLDOTNA   
2   UNIVERSITY OF ARIZONA MEDICAL CTR-UNIVERSIT, THE  1501 NORTH CAMPBELL AVENUE        TUCSON   
3                      BANNER BAYWOOD MEDICAL CENTER    6644 EAST BAYWOOD AVENUE          MESA   
4  UNIVERSITY OF ARIZONA MEDICAL CTR- SOUTH CAM, THE           2800 EAST AJO WAY        TUCSON   

  Provider State  Provider Zip Code Hospital Referral Region (HRR) Description  \
0             AL              36801                            AL - Birmingham   
1             AK              99669                             AK - Anchorage   
2             AZ              85724                                AZ - Tucson   
3             AZ              85206                                  AZ - Mesa   
4             AZ              85713                                AZ - Tucson   

   Outpatient Services  Average  Estimated Submitted Charges  Average Total Payments  
0                   23                             78.086957               21.910435  
1                  994                            149.589749               36.623853  
2                 1765                             50.135411               14.541841  
3                   20                            112.400000               23.736000  
4                   22                            152.045455               16.569091  

In the interests of full disclosure, there are a couple items to note:

  1. This is a reasonably rich data set. We will only briefly consider some exploratory views of the data, but it would not be reasonable to assume a high degree of inferential confidence without stricter construction of controls. In other words, the view we see here would be suggestive of what is actually going on, but should not be taken as gospel. This is meant to be just a taste, and not an exhaustive review of even the range of descriptive views of the data.

  2. Gottlieb (2013) appears to have external validity concerns with this data, which is not altogether unreasonable from what I can tell. However, we often work with limited data, and as far as I know this is the best publicly available data of its kind. I am more than happy to accommodate different sources if they should arise. In general, cautiously used data with limitations is far superior than no data at all.

So, what are we looking for here? In a nutshell, we are looking for variation in the cost of the same procedures. If competition and transparency are deficient in the health services market, we would not expect to see convergence to common prices. In reality, some variation will occur for reasons other than the basic cost of service provision. For example, we will see differences across geographic space due to real estate differentials that in turn create variation in overhead costs.

In any case, a good place to start would be to provide minimal information on the data. There are two separate datasets, one for inpatient charges, and one for outpatient charges. Here are the available variables.

In [17]:
print '***INPATIENT***\n',inpat
print '\n***OUTPATIENT\n',outpat
***INPATIENT***
<class 'pandas.core.frame.DataFrame'>
Int64Index: 163065 entries, 0 to 163064
Data columns (total 11 columns):
DRG Definition                          163065  non-null values
Provider Id                             163065  non-null values
Provider Name                           163065  non-null values
Provider Street Address                 163065  non-null values
Provider City                           163065  non-null values
Provider State                          163065  non-null values
Provider Zip Code                       163065  non-null values
Hospital Referral Region Description    163065  non-null values
 Total Discharges                       163065  non-null values
 Average Covered Charges                163065  non-null values
 Average Total Payments                 163065  non-null values
dtypes: float64(2), int64(3), object(6)

***OUTPATIENT
<class 'pandas.core.frame.DataFrame'>
Int64Index: 43372 entries, 0 to 43371
Data columns (total 11 columns):
APC                                           43372  non-null values
Provider Id                                   43372  non-null values
Provider Name                                 43372  non-null values
Provider Street Address                       43372  non-null values
Provider City                                 43372  non-null values
Provider State                                43372  non-null values
Provider Zip Code                             43372  non-null values
Hospital Referral Region (HRR) Description    43372  non-null values
Outpatient Services                           43372  non-null values
Average  Estimated Submitted Charges          43372  non-null values
Average Total Payments                        43372  non-null values
dtypes: float64(2), int64(3), object(6)

As can be seen, there are 163,065 records for the inpatient set and 43,372 records for the outpatient set. It appears that each record couples a given procedure with a provider. Each procedure/provider combination is coupled with average charge information and some attribute data about the provider among other things.

How many procedures are represented in each set?

In [18]:
print 'THERE ARE ',len(set(inpat['DRG Definition'])),' PROCEDURES IN THE INPATIENT SET'
print 'THERE ARE ',len(set(outpat['APC'])),' PROCEDURES IN THE OUTPATIENT SET'
THERE ARE  100  PROCEDURES IN THE INPATIENT SET
THERE ARE  30  PROCEDURES IN THE OUTPATIENT SET

How many providers are represented?

In [19]:
print 'THERE ARE ',len(set(inpat['Provider Id'])),' PROVIDERS IN THE INPATIENT SET'
print 'THERE ARE ',len(set(outpat['Provider Id'])),' PROVIDERS IN THE OUTPATIENT SET'
THERE ARE  3337  PROVIDERS IN THE INPATIENT SET
THERE ARE  3135  PROVIDERS IN THE OUTPATIENT SET

A few thousand providers will give us a nice distributional view of charges. We should be careful to consider the spatial distribution of these records (procedure/provider combinations). Are we heavily weighted in one region of the country versus another? The data all come with a zip code attribute, so we can use a Census shapefile that has better resolution than states alone.

In [20]:
'''The zip code shapefile is large, so we will load the data in a standalone cell'''
zip_shp=gp.GeoDataFrame.from_file(data_dir+'tl_2013_us_zcta510.shp')
In [21]:
'''We now need to join the Medicare data to the zip code shapefile.  We will effectively be creating a spatial 
histogram, so we first need to group to zip codes, and extract the count.  This summary DF will then be joined to
the zip code shapefile.'''

#Create new integer version of zip codes in zip GDF
zip_shp['zip_int']=[int(x) for x in zip_shp['ZCTA5CE10']]

#Set the index
zip_shp2=zip_shp.set_index('zip_int')

#Define mask to exclude Alaska and Hawaii
in_ak_hi_mask=-inpat['Provider State'].isin(['AK','HI'])
out_ak_hi_mask=-outpat['Provider State'].isin(['AK','HI'])

#Groupby zip code and count
in_zip=inpat[in_ak_hi_mask].groupby('Provider Zip Code').count()
out_zip=outpat[out_ak_hi_mask].groupby('Provider Zip Code').count()

#Join Medicare data
in_shp=gp.GeoDataFrame(zip_shp2.join(in_zip))
out_shp=gp.GeoDataFrame(zip_shp2.join(out_zip))

#Report proportion lost in the join
in_zip_loss=len(set(in_zip.index)-set(zip_shp2.index))/float(len(set(in_zip.index)))
out_zip_loss=len(set(out_zip.index)-set(zip_shp2.index))/float(len(set(out_zip.index)))
print 'THE INPATIENT JOIN LOST THE FOLLOWING PROPORTION OF ZIP CODES:', in_zip_loss
print 'THE OUTPATIENT JOIN LOST THE FOLLOWING PROPORTION OF ZIP CODES:', out_zip_loss
THE INPATIENT JOIN LOST THE FOLLOWING PROPORTION OF ZIP CODES: 0.0527530497857
THE OUTPATIENT JOIN LOST THE FOLLOWING PROPORTION OF ZIP CODES: 0.0541666666667

In [35]:
'''We can now plot the zip code level inpatient data...'''
#Set plot size
plt.rcParams['figure.figsize']=20,12

#Set line thickness
plt.rcParams['lines.linewidth']=.001

#Create plotting object
fig,axes=plt.subplots(1)

#Plot inpatient record coverage by zip code
in_shp[in_shp['Provider Id'].notnull()].plot(column='Provider Id',colormap='RdBu',axes=axes)

#Get rid of chart junk (axes)
axes.set_axis_off()

#Set title
axes.set_title('Inpatient Record Coverage',fontsize=22);
In [36]:
'''...followed by the zip code level outpatient data'''
#Set plot size
plt.rcParams['figure.figsize']=18,11

#Set line thickness
plt.rcParams['lines.linewidth']=.001

#Create plotting object
fig,axes=plt.subplots(1)

#Plot outpatient record coverage by zip code
out_shp[out_shp['Provider Id'].notnull()].plot(column='Provider Id',colormap='RdBu',axes=axes)

#Get rid of chart junk (axes)
axes.set_axis_off()

#Set title
axes.set_title('Outpatient Record Coverage',fontsize=22);

Hexbin maps probably would be a bit cleaner for showing this coverage, but the labor-return ratio just isn't quite there for this purpose. In any case, it appears the zip code distribution is quite similar (if not identical) across the inpatient and outpatient sources. There also appears to be a strong eastward geographic bias in the sample. This isn't the end of the world, but certainly something to keep in mind.

It would be nice to get a clearer view of the distribution of records per zip code. We filtered Alaska and Hawaii out of the histogram maps above because their distances from "lower 48" had visual proportional implications that limited the usefulness of the maps. In other words, the "lower 48" became too small to see much. We do not need to filter these states out when viewing the data non-spatially.

In [45]:
'''We are going to recreate the groupby objects (counting records within zip codes) without filtering Alaska
and Hawaii. Then we can view the distributional shape of records per zip code.'''

#Groupby zip code and count
in_zip2=inpat.groupby('Provider Zip Code').count()
out_zip2=outpat.groupby('Provider Zip Code').count()

#Set plot size
plt.rcParams['figure.figsize']=15,8

#Set color palette
c1,c2,c3,c4,c5=seaborn.color_palette('Set1',5)

#Generate plot object
fig,axes=plt.subplots(1)

#Plot kernel density of input and outpatient data
seaborn.kdeplot(in_zip2['Provider Id'].astype(float).values,shade=True,color=c4,ax=axes,label='Inpatient')
seaborn.kdeplot(out_zip2['Provider Id'].astype(float).values,shade=True,color=c5,ax=axes,label='Outpatient')

#Set title
axes.set_title('Distribution of Records per Zip Code',fontsize=22)

#Set facecolor
axes.patch.set_facecolor('white')

While it is not surprising that we should have many more zip codes in the inpatient with many records (there are more inpatient records to begin with), it is interesting that the inpatient distribution is so much flatter. It suggests a much heavier influence of dominant zip codes in the inpatient data. We could explore these properties a bit further, but for the purpose of this Notebook, we have a sense of things to consider when evaluating the distribution of charges.

Variation in Charges

As noted above, there are 100 procedures in the inpatient set and 30 procedures in the outpatient set. We will not explore all of them here, but if specific inquiries are made, we could conceivably look into them further. In the interests of time, we will focus on four cases:

  1. Maximum number of instances
  2. 75% percentile
  3. 50% percentile
  4. 25% percentile

The idea is to capture variation in categorically different locations in the frequency distribution. Hopefully this will give a more representative picture than just focusing on the most common procedure. The first step is identifying the procedures that fit in each of these buckets.

In [65]:
'''To identify the procedures at each of these distributional positions, we need value counts for all procedures.
We can then rely on the pandas built-in describe function to provide the relevant counts, and select the procedures 
associated with said counts. Note that the values returned by describe may not correspond with actual values in
the value counts.  When this occurs, we just use the closest value.'''

#Count instances of each procedure
proc_counts=inpat['DRG Definition'].value_counts()

#Identify counts at each distributional position
max_c=proc_counts.describe()['max']
x75=proc_counts.describe()['75%']
x50=proc_counts.describe()['50%']
x25=proc_counts.describe()['25%']

#Define function to find the closest actual value in proc_counts
def find_nearest(seq,val):
    idx=(np.abs(seq-val)).argmin()
    return seq[idx]

#Identify associated procedure
proc_max=proc_counts[proc_counts==find_nearest(proc_counts.values,max_c)].index[0]
proc_75=proc_counts[proc_counts==find_nearest(proc_counts.values,x75)].index[0]
proc_50=proc_counts[proc_counts==find_nearest(proc_counts.values,x50)].index[0]
proc_25=proc_counts[proc_counts==find_nearest(proc_counts.values,x25)].index[0]

print '***PROCEDURES AT PRESCRIBED DISTRIBUTIONAL POSITIONS***\n'
print 'MOST COMMON:',proc_max
print '75%:',proc_75
print '50%:',proc_50
print '25%:',proc_25
***PROCEDURES AT PRESCRIBED DISTRIBUTIONAL POSITIONS***

MOST COMMON: 194 - SIMPLE PNEUMONIA & PLEURISY W CC
75%: 189 - PULMONARY EDEMA & RESPIRATORY FAILURE
50%: 300 - PERIPHERAL VASCULAR DISORDERS W CC
25%: 684 - RENAL FAILURE W/O CC/MCC

Now we can subset by each of these procedures and check out the variation in charges for each.

In [68]:
'''We are just subsetting the inpatient data by procedure here'''

#Subset by each of the procedures
in_max=inpat[inpat['DRG Definition']==proc_max]
in_75=inpat[inpat['DRG Definition']==proc_75]
in_50=inpat[inpat['DRG Definition']==proc_50]
in_25=inpat[inpat['DRG Definition']==proc_25]

in_max.head()
Out[68]:
DRG Definition Provider Id Provider Name Provider Street Address Provider City Provider State Provider Zip Code Hospital Referral Region Description Total Discharges Average Covered Charges Average Total Payments
16 194 - SIMPLE PNEUMONIA & PLEURISY W CC 10001 SOUTHEAST ALABAMA MEDICAL CENTER 1108 ROSS CLARK CIRCLE DOTHAN AL 36301 AL - Dothan 107 21096.94393 5832.738318
104 194 - SIMPLE PNEUMONIA & PLEURISY W CC 10005 MARSHALL MEDICAL CENTER SOUTH 2505 U S HIGHWAY 431 NORTH BOAZ AL 35957 AL - Birmingham 66 14732.00000 6131.984848
155 194 - SIMPLE PNEUMONIA & PLEURISY W CC 10006 ELIZA COFFEE MEMORIAL HOSPITAL 205 MARENGO STREET FLORENCE AL 35631 AL - Birmingham 112 27976.01786 5688.758929
231 194 - SIMPLE PNEUMONIA & PLEURISY W CC 10007 MIZELL MEMORIAL HOSPITAL 702 N MAIN ST OPP AL 36467 AL - Dothan 58 15039.50000 5182.275862
250 194 - SIMPLE PNEUMONIA & PLEURISY W CC 10008 CRENSHAW COMMUNITY HOSPITAL 101 HOSPITAL CIRCLE LUVERNE AL 36049 AL - Montgomery 12 29104.83333 7576.916667

There are three important data definitions that can be found here, but will be reiterated anyway:

  1. Total Discharges = The number of discharges billed by the provider for inpatient hospital services.
  2. Average Covered Charges = The provider's average charge for services covered by Medicare for all discharges in the DRG. These will vary from hospital to hospital because of differences in hospital charge structures.
  3. Average Total Payments = The average of Medicare payments to the provider for the DRG including the DRG amount, teaching, disproportionate share, capital, and outlier payments for all cases. Also included are co-payment and deductible amounts for which the patient is responsible.

We will ask three basic questions, each of which could be explored much farther than will occur here.

  1. What is the overall distribution of covered charges and payments?
  2. Are there large variations in covered charges and payments across states?
  3. Are there large variations in covered charges and payments within states?

The first is simple enough to capture with kernel density plots.

In [74]:
'''We are just making more density plots as we did above'''
#Set plot size
plt.rcParams['figure.figsize']=15,16

#Generate plot object
fig,axes=plt.subplots(2)

#Plot kernel density of input and outpatient data
seaborn.kdeplot(in_max[' Average Covered Charges '].astype(float).values,shade=True,color=c1,ax=axes[0],label='Max')
seaborn.kdeplot(in_75[' Average Covered Charges '].astype(float).values,shade=True,color=c2,ax=axes[0],label='75%')
seaborn.kdeplot(in_50[' Average Covered Charges '].astype(float).values,shade=True,color=c3,ax=axes[0],label='50%')
seaborn.kdeplot(in_25[' Average Covered Charges '].astype(float).values,shade=True,color=c4,ax=axes[0],label='25%')

seaborn.kdeplot(in_max[' Average Total Payments '].astype(float).values,shade=True,color=c1,ax=axes[1],label='Max')
seaborn.kdeplot(in_75[' Average Total Payments '].astype(float).values,shade=True,color=c2,ax=axes[1],label='75%')
seaborn.kdeplot(in_50[' Average Total Payments '].astype(float).values,shade=True,color=c3,ax=axes[1],label='50%')
seaborn.kdeplot(in_25[' Average Total Payments '].astype(float).values,shade=True,color=c4,ax=axes[1],label='25%')


#Set title
axes[0].set_title('Distribution of Average Covered Charges by Procedure',fontsize=22)
axes[1].set_title('Distribution of Average Payments by Procedure',fontsize=22)

#Set facecolor
axes[0].patch.set_facecolor('white')
axes[1].patch.set_facecolor('white')

So why are we looking at the total distribution of charges and payments? It gives us a baseline from which to judge variation across and within states. We also have a sense of which procedures may have more inherent variation. The interesting thing to note in the plots above is that variation does not appear to be driven entirely by the frequency of the procedure. We would expect that with increasing frequency the procedure would become more standardized with respect to protocol and price. In fact, the least frequent procedure in the group has the tightest distribution. In other words, it varies in cost the least. Further, it is interesting to note that the most variation comes from the procedure in the 75 percentile position.

This is clearly a high-level observation and would need further exploration to verify, but it is quite suggestive. What about variation in average charges across states? To evaluate this, we need a measure of the average charges and payments by state. The charge and payment data are pegged to the discharges, so we can use these as weights in the construction of a weighted average for each state.

In [80]:
'''To construct the weighted averages of charges and payments by state, we will utilize the trusty 'split-apply-combine' 
approach.  (See Hadley Wickham's plyr package if you are unfamiliar:  
    http://www.r-bloggers.com/a-fast-intro-to-plyr-for-r/ )
We will split each DF by state, calculate the weighted sums using discharges as weights for both payments and charges,
recombine the states into a single DF, and join it with our state shapefile.'''

def wt_avg(df):
    #Generate list of states
    st_list=sorted(set(df['Provider State']))
    #Generate lists for average data
    charge_list=[]
    pay_list=[]
    #For each state in the procedure specific subset...
    for st in st_list:
        #...subset by the state...
        st_sub=in_max[in_max['Provider State']==st]
        #...calculate total charges and payments...
        st_sub['tot_charge']=st_sub[' Total Discharges ']*st_sub[' Average Covered Charges ']
        st_sub['tot_pay']=st_sub[' Total Discharges ']*st_sub[' Average Total Payments ']
        #...divide total charges and payments by total discharges...
        avg_charge=st_sub['tot_charge'].sum()/st_sub[' Total Discharges '].sum()
        avg_pay=st_sub['tot_pay'].sum()/st_sub[' Total Discharges '].sum()
        #...and throw the resultant values in a list
        charge_list.append(avg_charge)
        pay_list.append(avg_pay)
    #Construct a DF from the output lists   
    avg_df=DataFrame({'state':st_list,
                      'charges':charge_list,
                      'payments':pay_list}).set_index('state')
    return avg_df

#Join each summary DF with state shapefile 
imax_geo=gp.GeoDataFrame(states.join(wt_avg(in_max)))
i75_geo=gp.GeoDataFrame(states.join(wt_avg(in_75)))
i50_geo=gp.GeoDataFrame(states.join(wt_avg(in_50)))
i25_geo=gp.GeoDataFrame(states.join(wt_avg(in_25)))

We need to see this data in both spatial and non-spatial formats. First the charges...

In [87]:
#Set plot size
plt.rcParams['figure.figsize']=18,54

#Create plotting object
fig,axes=plt.subplots(5)

#Plot covered charges by state
gp.GeoDataFrame(imax_geo.ix[st_set_sub]).plot(column='charges',colormap='Reds',axes=axes[0])
gp.GeoDataFrame(i75_geo.ix[st_set_sub]).plot(column='charges',colormap='Blues',axes=axes[1])
gp.GeoDataFrame(i50_geo.ix[st_set_sub]).plot(column='charges',colormap='Greens',axes=axes[2])
gp.GeoDataFrame(i25_geo.ix[st_set_sub]).plot(column='charges',colormap='Purples',axes=axes[3])

seaborn.kdeplot(imax_geo['charges'].astype(float).values,shade=True,color=c1,ax=axes[4],label='Max')
seaborn.kdeplot(i75_geo['charges'].astype(float).values,shade=True,color=c2,ax=axes[4],label='75%')
seaborn.kdeplot(i50_geo['charges'].astype(float).values,shade=True,color=c3,ax=axes[4],label='50%')
seaborn.kdeplot(i25_geo['charges'].astype(float).values,shade=True,color=c4,ax=axes[4],label='25%')

#Get rid of chart junk (axes)
axes[0].set_axis_off()
axes[1].set_axis_off()
axes[2].set_axis_off()
axes[3].set_axis_off()

#Set title
axes[0].set_title('Charges by State - Most Common Procedure',fontsize=22)
axes[1].set_title('Charges by State - 75th Percentile Procedure',fontsize=22)
axes[2].set_title('Charges by State - 50th Percentile Procedure',fontsize=22)
axes[3].set_title('Charges by State - 25th Percentile Procedure',fontsize=22)
axes[4].set_title('Distribution of Charges by State',fontsize=22)

#Set facecolor
axes[4].patch.set_facecolor('white')

...and now the payments data.

In [88]:
#Set plot size
plt.rcParams['figure.figsize']=18,54

#Create plotting object
fig,axes=plt.subplots(5)

#Plot covered charges by state
gp.GeoDataFrame(imax_geo.ix[st_set_sub]).plot(column='payments',colormap='Reds',axes=axes[0])
gp.GeoDataFrame(i75_geo.ix[st_set_sub]).plot(column='payments',colormap='Blues',axes=axes[1])
gp.GeoDataFrame(i50_geo.ix[st_set_sub]).plot(column='payments',colormap='Greens',axes=axes[2])
gp.GeoDataFrame(i25_geo.ix[st_set_sub]).plot(column='payments',colormap='Purples',axes=axes[3])

seaborn.kdeplot(imax_geo['payments'].astype(float).values,shade=True,color=c1,ax=axes[4],label='Max')
seaborn.kdeplot(i75_geo['payments'].astype(float).values,shade=True,color=c2,ax=axes[4],label='75%')
seaborn.kdeplot(i50_geo['payments'].astype(float).values,shade=True,color=c3,ax=axes[4],label='50%')
seaborn.kdeplot(i25_geo['payments'].astype(float).values,shade=True,color=c4,ax=axes[4],label='25%')

#Get rid of chart junk (axes)
axes[0].set_axis_off()
axes[1].set_axis_off()
axes[2].set_axis_off()
axes[3].set_axis_off()

#Set title
axes[0].set_title('Payments by State - Most Common Procedure',fontsize=22)
axes[1].set_title('Payments by State - 75th Percentile Procedure',fontsize=22)
axes[2].set_title('Payments by State - 50th Percentile Procedure',fontsize=22)
axes[3].set_title('Payments by State - 25th Percentile Procedure',fontsize=22)
axes[4].set_title('Distribution of Payments by State',fontsize=22)

#Set facecolor
axes[4].patch.set_facecolor('white')

Wow. There is considerable variation across states, but almost no variation across procedures. This suggests that 1) there are strong state-specific factors heavily influencing the payment structure in each state (perhaps an expected finding), and 2) (again) frequency appears to play an insignificant role in the distribution of pricing. Both of these findings are consistent with the idea that substantial barriers to efficient market clearing exist.

If these barriers do exist, the ACA focus on provider transparency and exchange-based competition could bear real fruit. It would also suggest that the Republican proposal to enable the purchase of insurance across state borders could also be quite productive.

What about variation within states? First the charge data...

In [148]:
'''We will use a simple boxplot to capture differences by state.  Coloring by party of the Governor would also be
useful to capture any partisan differences that may exist.  To do this, we will join in party information, map party
colors, and plot the results.

Note that we also care about the order of the states.  If they are arranged by average value, the information is
easier to absorb, and we may better pick up any trends that exist.
'''

#Join party information in with cost data
inpat_col=inpat.set_index('Provider State').join(aca['party'])

#Capture order of states (by mean charge)
order1=list(inpat_col[' Average Covered Charges '].groupby(level=0).mean().order().index)

#Map colors to each state
color_map={0:'#DE2D26',
           1:'#3182BD'}
inpat_col['color']=inpat_col['party'].map(color_map)

#Capture a smaller list with one record for each state (and the associated color)
col_by_state1=inpat_col.groupby(level=0).last()['color'].fillna('#CCCCCC').ix[order1]

#Generate a Seaborn color palette
party_colors1=seaborn.color_palette(list(col_by_state1.values),len(col_by_state1))

#Set plot size
plt.rcParams['figure.figsize']=18,25

#Generate plot object
fig,axes=plt.subplots(1)

#Plot variation in total charges by state
seaborn.boxplot(inpat_col[' Average Covered Charges '],inpat_col.index,color=party_colors1,order=order1,vert=False,ax=axes)

#Set title
axes.set_title('Variation in Charges by State',fontsize=22)

#Set facecolor
axes.patch.set_facecolor('white')

...and then the payments.

In [149]:
#Capture order of states (by mean charge)
order2=list(inpat_col[' Average Total Payments '].groupby(level=0).mean().order().index)

#Capture a smaller list with one record for each state (and the associated color)
col_by_state2=inpat_col.groupby(level=0).last()['color'].fillna('#CCCCCC').ix[order2]

#Generate a Seaborn color palette
party_colors2=seaborn.color_palette(list(col_by_state2.values),len(col_by_state2))

#Set plot size
plt.rcParams['figure.figsize']=18,25

#Generate plot object
fig,axes=plt.subplots(1)

#Plot variation in total charges by state
seaborn.boxplot(inpat_col[' Average Total Payments '],inpat_col.index,color=party_colors2,order=order2,vert=False,ax=axes)

#Set title
axes.set_title('Variation in Payments by State',fontsize=22)

#Set facecolor
axes.patch.set_facecolor('white')

These high-level views provide a sense of the baseline variation in each state. As can be seen, there are significant differences across states. Correlations with party appear to be more meaningful with variation in payments versus variation in charges. For some reason, Democratic states appear to have higher variance in their payment structure.

What we really want to see, however, is variance within procedure. We can use the coefficient of variation as a nice summary measure. As a consequence of normalizing variation by the mean, it facilitates comparisons across groups.

In [150]:
'''We will again rely on split-apply-combine to find the CoV within each state.  We will split the data by state, 
divide the charge and payment standard deviations by their respective means, and recombine the data for plotting.'''

def CoV(df):
    #Generate list of states
    st_list=sorted(set(df['Provider State']))
    #Generate lists for average data
    charge_list=[]
    pay_list=[]
    #For each state in the procedure specific subset...
    for st in st_list:
        #...subset by the state...
        st_sub=in_max[in_max['Provider State']==st]
        #...calculate total charge and payment means...
        charge_mean=st_sub[' Average Covered Charges '].mean()
        pay_mean=st_sub[' Average Total Payments '].mean()
        #...calculate total charge and payment standard deviations...
        charge_std=st_sub[' Average Covered Charges '].std()
        pay_std=st_sub[' Average Total Payments '].std()
        #...divide charge and payment standard deviations by their means...
        cov_charge=charge_std/charge_mean
        cov_pay=pay_mean/pay_std
        #...and throw the resultant values in a list
        charge_list.append(cov_charge)
        pay_list.append(cov_pay)
    #Construct a DF from the output lists   
    cov_df=DataFrame({'state':st_list,
                      'charges':charge_list,
                      'payments':pay_list}).set_index('state')
    return cov_df

#Join each summary DF with state shapefile 
imax_cov_geo=gp.GeoDataFrame(states.join(CoV(in_max)))
i75_cov_geo=gp.GeoDataFrame(states.join(CoV(in_75)))
i50_cov_geo=gp.GeoDataFrame(states.join(CoV(in_50)))
i25_cov_geo=gp.GeoDataFrame(states.join(CoV(in_25)))

Now we can once again plot the spatial and non-spatial distributions.

In [152]:
#Set plot size
plt.rcParams['figure.figsize']=18,54

#Create plotting object
fig,axes=plt.subplots(5)

#Plot covered charges by state
gp.GeoDataFrame(imax_cov_geo.ix[st_set_sub]).plot(column='charges',colormap='Reds',axes=axes[0])
gp.GeoDataFrame(i75_cov_geo.ix[st_set_sub]).plot(column='charges',colormap='Blues',axes=axes[1])
gp.GeoDataFrame(i50_cov_geo.ix[st_set_sub]).plot(column='charges',colormap='Greens',axes=axes[2])
gp.GeoDataFrame(i25_cov_geo.ix[st_set_sub]).plot(column='charges',colormap='Purples',axes=axes[3])

seaborn.kdeplot(imax_cov_geo['charges'].astype(float).values,shade=True,color=c1,ax=axes[4],label='Max')
seaborn.kdeplot(i75_cov_geo['charges'].astype(float).values,shade=True,color=c2,ax=axes[4],label='75%')
seaborn.kdeplot(i50_cov_geo['charges'].astype(float).values,shade=True,color=c3,ax=axes[4],label='50%')
seaborn.kdeplot(i25_cov_geo['charges'].astype(float).values,shade=True,color=c4,ax=axes[4],label='25%')

#Get rid of chart junk (axes)
axes[0].set_axis_off()
axes[1].set_axis_off()
axes[2].set_axis_off()
axes[3].set_axis_off()

#Set title
axes[0].set_title('Variation in Charges by State - Most Common Procedure',fontsize=22)
axes[1].set_title('Variation in Charges by State - 75th Percentile Procedure',fontsize=22)
axes[2].set_title('Variation in Charges by State - 50th Percentile Procedure',fontsize=22)
axes[3].set_title('Variation in Charges by State - 25th Percentile Procedure',fontsize=22)
axes[4].set_title('Distribution of Charge CoV by State',fontsize=22)

#Set facecolor
axes[4].patch.set_facecolor('white')
In [153]:
#Set plot size
plt.rcParams['figure.figsize']=18,54

#Create plotting object
fig,axes=plt.subplots(5)

#Plot covered charges by state
gp.GeoDataFrame(imax_cov_geo.ix[st_set_sub]).plot(column='payments',colormap='Reds',axes=axes[0])
gp.GeoDataFrame(i75_cov_geo.ix[st_set_sub]).plot(column='payments',colormap='Blues',axes=axes[1])
gp.GeoDataFrame(i50_cov_geo.ix[st_set_sub]).plot(column='payments',colormap='Greens',axes=axes[2])
gp.GeoDataFrame(i25_cov_geo.ix[st_set_sub]).plot(column='payments',colormap='Purples',axes=axes[3])

seaborn.kdeplot(imax_cov_geo['payments'].astype(float).values,shade=True,color=c1,ax=axes[4],label='Max')
seaborn.kdeplot(i75_cov_geo['payments'].astype(float).values,shade=True,color=c2,ax=axes[4],label='75%')
seaborn.kdeplot(i50_cov_geo['payments'].astype(float).values,shade=True,color=c3,ax=axes[4],label='50%')
seaborn.kdeplot(i25_cov_geo['payments'].astype(float).values,shade=True,color=c4,ax=axes[4],label='25%')

#Get rid of chart junk (axes)
axes[0].set_axis_off()
axes[1].set_axis_off()
axes[2].set_axis_off()
axes[3].set_axis_off()

#Set title
axes[0].set_title('Variation in Payments by State - Most Common Procedure',fontsize=22)
axes[1].set_title('Variation in Payments by State - 75th Percentile Procedure',fontsize=22)
axes[2].set_title('Variation in Payments by State - 50th Percentile Procedure',fontsize=22)
axes[3].set_title('Variation in Payments by State - 25th Percentile Procedure',fontsize=22)
axes[4].set_title('Distribution of Payment CoV by State',fontsize=22)

#Set facecolor
axes[4].patch.set_facecolor('white')

We see very similar patterns to the variation across states in the sense that frequency of provision for a given procedure does not seem to be a factor. One the other hand, there are large variations in the consistency of charges and payment within state. This is only modestly true for charges, but payments exhibit remarkable variation in this regard.

Concluding Thoughts

The ACA is predicated on the idea that there are market inefficiencies that can be addressed that would increase the productivity of health expenditure. This has to happen to make expanding health care to more citizens a feasible prospect. The law seeks to do this by providing incentives that push health care providers into large integrated environments called Accountable Care Organizations. These incentives are passed largely through the tax codes, in a manner consistent with our country's pursuit of many unrelated policy goals.

There does not appear to be any a priori reason why this cannot yield some benefits. Gottlieb's (2013) concerns about historical efforts along similar veins may be somewhat tempered by updated infrastructure in the health services and related industries. Further, the experience of Massachusetts suggests that this type of reform has the potential to achieve stated policy goals (expanded insurance coverage, and greater utilization of health resources) while not imploding the private market for insurance (Gruber 2011).

All that being said, the ACA has a much more complex task than that which was undertaken in Massachusetts because it must coordinate the disparate interests and levels of implementation effort of the various states. Indeed, some states have vowed to block the ACA even after the Supreme Court put to rest any questions of constitutionality. Getting state systems to talk to one another is challenging even in the best of times.

The potential for benefit from a reshaping of the health market appears to exist. Data limitations notwithstanding, a case can convincingly be made that our current structure impedes the properties assumed to exist in efficient markets. Furthermore, it seems clear that the status quo is not a sustainable option. Ultimately, what is needed is a way to cut through the nonsense. The ACA is well within the legal authority of Congress, so we should focus on whether or not its policy that actually helps the population. We need an objective metric by which to assess the value added by the ACA, and Burke & Kamarck (2013) provide just that. The following is their list of measures we might use to assess performance. We may find them to be incomplete or in need of modification as time goes on, but they are as good an effort as any to evaluate the policy from a dispassionate perspective.

  1. Is there a reduction in the total number of uninsured?
  2. Is there an increase or stabilization in the cost of premiums on the exchanges and in the private market?
  3. Are there an adequate number of plans in the exchange and does the number increase or decrease over time? Are plans exiting or entering the market over time?
  4. Does the number of people who pay the penalty for not having insurance increase or decrease over time?
  5. Is there a decline in employer coverage?
  6. Is there a decline in full-time work and an increase in part-time work?
  7. What is the extent of the conflict between federal and state oversight of health insruance and does it increase or decrease over time?
  8. Is there evidence of an increase or decrease in out-of-pocket expenditures on health care?

I would also add a sort of meta-query: to what extent are we ok with tradeoffs in these goals? The paper expands on each of these, but the idea behind the list comes through readily.

This Notebook has just scratched the surface, but with any luck it can provide a reasonable frame for thinking about the Affordable Care Act, and incite additional questions.

In []: